Efficient Management of Geographically Distributed Big Data on Clouds
Abstract
Cloud infrastructures now allow storing and processing ever-increasing amounts of scientific data. However, most existing large-scale data management frameworks assume that users deploy their data-intensive applications in a single data center; few of them address data flows between data centers. Managing data across geographically distributed data centers is not trivial, as it involves high and variable latencies among sites, which come at a high monetary cost. In this report, we introduce a uniform data management system for disseminating scientific data across geographically distributed sites. Our solution is environment-aware: it monitors and models the global cloud infrastructure and offers predictable data-handling performance in terms of transfer cost and time. In terms of efficiency, it lets applications set the tradeoff between money and time and optimizes the transfer strategy accordingly. A prototype of our system has been implemented on the Windows Azure cloud, and extensive evaluations show encouraging results.
Related resources
A Survey on Geographically Distributed Big-Data Processing using MapReduce
Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, m...
An Architecture for Security and Protection of Big Data
The issue of online privacy and security is a challenging subject, as it concerns the privacy of data that are increasingly accessible via the internet. In other words, people who intend to access the private information of other users can do so more efficiently over the internet. This study is an attempt to address the privacy issue of distributed big data in the context of cloud computin...
Energy-efficient Analytics for Geographically Distributed Big Data
Big data analytics on geographically distributed datasets (across data centers or clusters) has been attracting increasing interest in both academia and industry, posing significant complications for system and algorithm design. In this article, we systematically investigate the geo-distributed big-data analytics framework by analyzing the fine-grained paradigm and the key design principles. W...
Data Locality-Aware Big Data Query Evaluation in Distributed Clouds
With more and more businesses and organizations outsourcing their IT services to distributed clouds for cost savings, historical and operational data generated by the services have been growing exponentially. The generated data that are referred to as big data, stored at different geographic datacenters, now become an invaluable asset to these businesses and organizations, as they can make use ...
Optimized Contract-based Model for Resource Allocation in Federated Geo-distributed Clouds
In the era of Big Data, with data growing massively in scale and velocity, cloud computing and its pay-as-you-go model continue to provide significant cost benefits and a seamless service delivery model for cloud consumers. The evolution of small-scale and large-scale geo-distributed datacenters operated and managed by individual Cloud Service Providers (CSPs) raises new challenges in terms of...